Transcriber: a Free Tool for Segmenting, Labeling and Transcribing Speech
نویسندگان
چکیده
This paper describes the first version of “Transcriber”, a tool for segmenting, labeling and transcribing speech. It is developed under Unix in the Tcl/Tk script language with extensions in C, and is available as free software. The environment offers the basic functions necessary for segmenting, labeling and transcribing long duration signals. The signal editor and the text editor are integrated and synchronized in order to display and play the current segment. The output is in a standard SGML format. Multiple languages are supported. The tool can be ported to various platforms and is very flexible so that new functions can be easily added. We hope that such a portable, widely available and flexible tool will benefit the whole community and make it easier to develop and share corpora.
منابع مشابه
Transcribing with Annotation Graphs
Transcriber is a tool for manual annotation of large speech files. It was originally designed for the broadcast news transcription task. The annotation file format was derived from previous formats used for this task, and many related features were hard-coded. In this paper we present a generalization of the tool based on the annotation graph formalism, and on a more modular design. This will a...
متن کاملLabeler agreement in transcribing korean intonation with K-toBI
This paper reports labeler agreement in the transcription of Korean prosody using Korean ToBI (K-ToBI) [9]. Twenty utterances representing five different types of speech were produced by 18 speakers and transcribed by 21 labelers differing in their levels of experience with K-ToBI. Following the stringent metric used for English ToBI evaluation [14,12], consistency was measured in terms of the ...
متن کاملTranscriber: Development and use of a tool for assisting speech corpora production
We present ``Transcriber'', a tool for assisting in the creation of speech corpora, and describe some aspects of its development and use. Transcriber was designed for the manual segmentation and transcription of long duration broadcast news recordings, including annotation of speech turns, topics and acoustic conditions. It is highly portable, relying on the scripting language Tcl/Tk with exten...
متن کاملTranscribing continuous speech using mismatched crowdsourcing
Mismatched crowdsourcing derives speech transcriptions using crowd workers unfamiliar with the language being spoken. This approach has been demonstrated for isolated word transcription tasks, but never yet for continuous speech. In this work, we demonstrate mismatched crowdsourcing of continuous speech with a word error rate of under 45% in a large-vocabulary transcription task of short speech...
متن کاملA Computer Assisted Speech Transcription System
Current automatic speech transcription systems can achieve a high accuracy although they still make mistakes. In some scenarios, high quality transcriptions are needed and, therefore, fully automatic systems are not suitable for them. These high accuracy tasks require a human transcriber. However, we consider that automatic techniques could improve the transcriber’s efficiency. With this idea w...
متن کامل